Identification of Diverse Database Subsets Using Property-Based and Fragment-Based Molecular Descriptions
Abstract
This paper compares calculated molecular properties and 2D fragment bit-strings as descriptors for selecting structurally diverse subsets from a file of 44,295 compounds. MaxMin dissimilarity-based selection and k-means cluster-based selection are used to select subsets containing between 1% and 20% of the file. The numbers of bioactive molecules in the selected subsets suggest that the MaxMin subsets are noticeably superior to the k-means subsets, that the property-based descriptors are marginally superior to the fragment-based descriptors, and that both approaches are noticeably superior to random selection.
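The two selection strategies named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure reported in the paper: fragment bit-strings are assumed to be stored as sets of on-bit indices and compared with the Tanimoto distance; property vectors are assumed to be pre-scaled and compared with Euclidean distance; MaxMin is assumed to start from a caller-chosen seed compound; and the k-means variant is assumed to take the compound nearest each final centroid as that cluster's representative. All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Set, TypeVar

T = TypeVar("T")
Fingerprint = Set[int]  # hypothetical storage: indices of the bits set in a 2D fragment bit-string


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """1 minus the Tanimoto coefficient of two bit-strings held as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty bit-strings count as identical
    return 1.0 - len(a & b) / union


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two property vectors (assumed already scaled)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_select(items: Sequence[T], n: int,
                  dist: Callable[[T, T], float], seed_index: int = 0) -> List[int]:
    """MaxMin: repeatedly add the item whose nearest selected neighbour is farthest away."""
    selected = [seed_index]
    chosen = {seed_index}
    # min_dist[i] holds the distance from item i to its nearest already-selected item
    min_dist = [dist(item, items[seed_index]) for item in items]
    while len(selected) < min(n, len(items)):
        nxt = max((i for i in range(len(items)) if i not in chosen),
                  key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        # Only the newly selected item can lower each remaining min_dist value
        for i, item in enumerate(items):
            min_dist[i] = min(min_dist[i], dist(item, items[nxt]))
    return selected


def kmeans_select(points: Sequence[Sequence[float]], k: int,
                  iters: int = 20, rng_seed: int = 0) -> List[int]:
    """k-means on property vectors, then pick the point nearest each final centroid."""
    rng = random.Random(rng_seed)
    dim = len(points[0])
    # Initial centroids: k distinct points drawn at random (k must not exceed len(points))
    centroids = [list(points[i]) for i in rng.sample(range(len(points)), k)]
    clusters: List[List[int]] = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: euclidean(p, centroids[j]))
            clusters[nearest].append(i)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(points[i][d] for i in members) / len(members)
                                for d in range(dim)]
    # One representative per non-empty cluster, so fewer than k may be returned
    return [min(members, key=lambda i: euclidean(points[i], centroids[j]))
            for j, members in enumerate(clusters) if members]


if __name__ == "__main__":
    fps = [{1, 2, 3}, {1, 2, 4}, {7, 8}, {7, 9}, {20}]
    print(maxmin_select(fps, 3, tanimoto_distance))  # [0, 2, 4]
    props = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans_select(props, 2))
```

In the toy fingerprint example, MaxMin skips compound 1 because it shares most of its bits with the seed, which illustrates why the method favours spread-out subsets. MaxMin keeps one running nearest-selected distance per compound, so each new pick costs a single pass over the file rather than a full pairwise comparison.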
Similar resources
Molecular identification of infertile bulls by using newly developed DDX3Y based on human STS markers
To determine the role of DDX3Y gene in spermatogenesis and infertility in bulls, blood samples were collected from five infertile bulls (azoospermic; no sperm in the semen) at the Animal Breeding Center in Karaj, Iran. The recommended human primers by EAA/EQMN were investigated using the BLASTn database for STS marker detection. Alignment of STS marker genes with bovine genome was performed. Pr...
Molecular identification of Dicrocoelium dendriticum using 28s rDNA genomic marker and its histopathologic features in domestic animals in western Iran
Introduction: Dicrocoeliasis is a common disease of bile ducts and gallbladder of domestic and wild ruminants. This disease is caused by different species of dicrocoelium including Dicrocoelium dendriticum. The aim of this study was to identify pathological damages and molecular features associated with this parasite in ruminants. Materials and Methods: In this cross-sectional study, 180 fresh...
Similarity-based data mining in files of two-dimensional chemical structures using fingerprint measures of molecular resemblance
This paper reviews the use of measures of inter-molecular similarity for processing databases of chemical structures, which play an important role in the discovery of new drugs by the pharmaceutical industry. The similarity measures considered here are based on the use of a fingerprint representation of molecular structure, where a fingerprint is a vector encoding the presence of fragment subst...
Identification of Candida species isolated from vulvovaginal candidiasis using PCR-RFLP
Vulvovaginal candidiasis (VVC) is a common disease among women worldwide; therefore, accurate and rapid diagnosis of causative agents based on molecular techniques utilizing amplification of target DNA is highly recommended for epidemiological purposes and for effective treatment. The aim of this study was to identify clinically Candida species from VVC patients by restriction fragment length po...
Similarity and Dissimilarity Methods for Processing Chemical Structure Databases
This paper reviews measures of similarity and dissimilarity between pairs of chemical molecules and the use of such measures for processing chemical databases. The applications discussed include similarity searching, database clustering and diversity analysis, focusing upon measures that are based on fragment bit-string occurrence data. The paper then discusses recent work on the calculation of...